Compact and Understandable Descriptions of Mixtures of Bernoulli Distributions

Authors

  • Jaakko Hollmén
  • Jarkko Tikka
Abstract

Finite mixture models can be used to estimate complex, unknown probability distributions and to cluster data. The parameters of such models form a complex representation and are not, as such, suitable for interpretation. In this paper, we present a methodology for describing a finite mixture of multivariate Bernoulli distributions compactly and understandably. First, we cluster the data with the mixture model; we then extract the maximal frequent itemsets from the cluster-specific data sets. The mixture model describes the data set globally, while the frequent itemsets model the marginal distributions of the partitioned data locally. We present the results in understandable terms that reflect the domain properties of the data. In our application, the analysis of DNA copy number amplifications, the amplification patterns are described in the nomenclature that the literature uses to report such patterns and that domain experts in biology and medicine generally use.
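The two-step procedure outlined in the abstract (cluster with a Bernoulli mixture, then describe each cluster by its maximal frequent itemsets) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_bernoulli_mixture` and `maximal_frequent_itemsets`, the support threshold of 0.8, and the noise-free toy data are all assumptions made for illustration, and the brute-force itemset search is only a stand-in for a proper frequent-itemset miner.

```python
import itertools
import numpy as np

def fit_bernoulli_mixture(X, n_components, n_iter=100, seed=0):
    """Plain EM for a mixture of multivariate Bernoulli distributions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_components, 1.0 / n_components)
    theta = rng.uniform(0.25, 0.75, size=(n_components, d))
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and per-component Bernoulli parameters.
        nk = resp.sum(axis=0) + 1e-10  # guard against empty components
        weights = nk / nk.sum()
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return weights, theta, resp.argmax(axis=1)

def maximal_frequent_itemsets(X, min_support):
    """Brute-force search; only practical for a handful of variables."""
    n, d = X.shape
    if n == 0:
        return []
    frequent = []
    for size in range(1, d + 1):
        level = [
            frozenset(items)
            for items in itertools.combinations(range(d), size)
            if X[:, list(items)].all(axis=1).mean() >= min_support
        ]
        if not level:
            break  # no frequent itemset of this size, so none can be larger
        frequent.extend(level)
    # Keep only itemsets that have no frequent proper superset.
    return [s for s in frequent if not any(s < t for t in frequent)]

# Hypothetical toy data: 40 rows with items {0, 1, 2} set, 40 with {3, 4} set.
X = np.zeros((80, 6))
X[:40, [0, 1, 2]] = 1
X[40:, [3, 4]] = 1

# Step 1: global model. Step 2: local description of each cluster.
weights, theta, labels = fit_bernoulli_mixture(X, n_components=2)
descriptions = {
    k: maximal_frequent_itemsets(X[labels == k], min_support=0.8)
    for k in range(2)
}
for k, itemsets in descriptions.items():
    print(k, [sorted(s) for s in itemsets])
```

Each cluster is summarized by a short list of variable sets rather than by a full parameter vector, which is the kind of compact description the abstract aims for. On real data, the number of components and the support threshold would have to be chosen with care.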


Similar resources

Bayesian Mixtures of Bernoulli Distributions

The mixture of Bernoulli distributions [6] is a technique that is frequently used for modeling binary random vectors. Such mixtures differ from (restricted) Boltzmann Machines in that they do not model the marginal distribution over the binary data space X as a product of (conditional) Bernoulli distributions, but as a weighted sum of Bernoulli distributions. Despite the non-identifiability of th...


SPECIES OF MELILOTUS IN IRAN (KEY TO THE SPECIES, DESCRIPTIONS AND THEIR DISTRIBUTIONS)

The genus Melilotus contains some 20 species worldwide (WIERSEMA et al. 1990), which are mostly distributed in Europe, Central and SW. Asia and N. Africa (HUCHINSON 1964). BOISSIER (1872) listed 13 species of Melilotus, 6 of which are found in the Iranian flora. PARSA (1948), following Boissier’s work and referring to some new collections made by other workers, listed the same species n...


A risk adjusted self-starting Bernoulli CUSUM control chart with dynamic probability control limits

Usually, in monitoring schemes the nominal value of the process parameter is assumed to be known. However, this assumption is often violated owing to costly sampling and lack of data, particularly in healthcare systems. On the other hand, applying a fixed control limit for the risk-adjusted Bernoulli chart causes variable in-control average run length performance for patient populations with dissimilar...


On optimization, parallelization and convergence of the Expectation-Maximization algorithm for finite mixtures of Bernoulli distributions

This paper reviews the Maximum Likelihood estimation problem and its solution via the Expectation-Maximization algorithm. Emphasis is placed on the description of finite mixtures of multivariate Bernoulli distributions for modeling 0-1 data. General ideas about convergence and non-identifiability are presented. We discuss improvements to the algorithm and describe thoroughly what we believe are ...


Baseline Mixture Models for Social Networks

Continuous mixtures of distributions are widely employed in the statistical literature as models for phenomena with highly divergent outcomes; in particular, many familiar heavy-tailed distributions arise naturally as mixtures of light-tailed distributions (e.g., Gaussians), and play an important role in applications as diverse as modeling of extreme values and robust inference. In the case of s...



Publication date: 2007